Abstract: Fractional repetition (FR) codes are a family of codes
for distributed storage systems (DSS) that allow uncoded exact 
repairs with minimum repair bandwidth. In this work, we 
consider a bound on the maximum amount of data that can 
be stored using an FR code. Optimal FR codes which attain 
this bound are presented. The constructions of these FR codes 
are based on families of regular graphs, such as Turán graphs
and graphs with large girth; and on combinatorial designs, such 
as transversal designs and generalized polygons. In addition, 
based on a connection between FR codes and batch codes, we 
propose a new family of codes for DSS, called fractional repetition 
batch codes, which allow uncoded efficient exact repairs and load 
balancing which can be performed by several users in parallel. 

I. Introduction 

In distributed storage systems, data is stored across a
network of nodes, which can fail unexpectedly. To provide
reliability, data redundancy based on coding techniques is
introduced in such systems. Moreover, existing erasure codes
make it possible to minimize the storage overhead. In [?],
Dimakis et al. introduced a new family of erasure codes, called
regenerating codes, which allow efficient single node repairs.
In particular, they presented two families of regenerating
codes, called minimum storage regenerating (MSR) codes and
minimum bandwidth regenerating (MBR) codes, which correspond
to the two extreme points on the storage-bandwidth trade-off
[?]. An (n, k, d, α, β)_q regenerating code C, where
k ≤ d ≤ n − 1 and β ≤ α, is used to store a file in n nodes;
each node stores α symbols from F_q, the finite field with q
elements, such that the stored file can be recovered by
downloading the data from any set of k nodes. When a single
node fails, a newcomer node which substitutes the failed node
contacts a random set of d other nodes and downloads β symbols
from each node in this set to reconstruct the failed data. This
process is called a node repair, and the amount of data
downloaded to repair a failed node, βd, is called the repair
bandwidth.

In [?], Rashmi et al. presented a construction for
MBR codes which have the additional property of exact repair
by transfer, or exact uncoded repair. In other words, the
$(n, k, d = n-1, M = k\alpha - \binom{k}{2}, \alpha = n-1, \beta = 1)$
code proposed there allows efficient exact node repairs
where no decoding is needed. Every node participating in a
node repair process just passes one symbol, which is
directly stored in the newcomer node. This construction is
based on a concatenation of an outer MDS code with an inner
repetition code based on a complete graph. El Rouayheb and
Ramchandran [?] generalized this construction and
defined a new family of codes for DSS which allow exact
repairs by transfer for a wide range of parameters. These codes,
called DRESS (Distributed Replication based Exact Simple
Storage) codes [?], consist of the concatenation of an outer
MDS code and an inner repetition code called a fractional
repetition (FR) code. However, in contrast to MBR codes,
where a random set of d available nodes is used for a
node repair, the repairs with DRESS codes are table based. This
usually makes it possible to store more data than with MBR
codes.

Constructions of FR codes based on some regular graphs
and combinatorial designs can be found, for example, in [?].
However, the optimality of the constructed FR
codes regarding the FR capacity, i.e., the maximality of the size
of the stored file, was not considered.

In this work, we address the problem of constructing optimal 
FR codes and hence, optimal DRESS codes. Moreover, based 
on a connection between FR codes and combinatorial batch 
codes, we propose a new family of codes for DSS, called 
fractional repetition batch (FRB) codes, which enable uncoded 
repairs and load balancing that can be performed by several 
users in parallel. 

The rest of the paper is organized as follows. In Section II
we define DRESS codes and FR codes based on regular graphs
and combinatorial designs. In Section III we present optimal
FR codes based on Turán graphs and on graphs with large
girth. In Section IV we consider optimal FR codes based on
transversal designs and on generalized polygons. In Section V
we define FRB codes and present some examples of their
constructions. The conclusion is given in Section VI. We point
out that, throughout this paper, proofs are often omitted due to
space limitations. Details of all the proofs can be found in [?].


II. Preliminaries

An (n, α, ρ) FR code C is a collection of n subsets
N_1, ..., N_n of [θ] ≜ {1, 2, ..., θ}, nα = ρθ, such that

• |N_i| = α for each i, 1 ≤ i ≤ n,

• each symbol of [θ] belongs to exactly ρ subsets in C,
where ρ is called the repetition degree of C.
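
The two defining properties can be checked mechanically. The following Python sketch is only an illustration: the function name `is_fr_code` and the toy (4, 3, 2) instance are assumptions introduced here, not objects taken from the paper.

```python
from collections import Counter

def is_fr_code(subsets, theta, alpha, rho):
    """Check the two defining properties of an (n, alpha, rho) FR code on [theta]."""
    universe = set(range(1, theta + 1))
    # Every node must store exactly alpha distinct symbols taken from [theta].
    if any(len(s) != alpha or not s <= universe for s in subsets):
        return False
    # Every symbol of [theta] must appear in exactly rho nodes.
    counts = Counter(x for s in subsets for x in s)
    return all(counts[x] == rho for x in universe)

# Hypothetical toy instance (assumption): n = 4, alpha = 3, rho = 2, theta = 6.
toy = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]
print(is_fr_code(toy, theta=6, alpha=3, rho=2))
```

In this toy instance nα = 4 · 3 = 12 = 2 · 6 = ρθ, as the definition requires.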

A [(θ, M), k, (n, α, ρ)] DRESS code is a code obtained by
the concatenation of an outer (θ, M) MDS code and an inner
(n, α, ρ) FR code C. To store a file f ∈ F_q^M in a DSS, f is
first encoded by using the MDS code; next, the θ symbols of the
codeword c_f from the MDS code, which encodes the file f, are
placed in the n nodes defined by C as follows: node i ∈ [n]
of the DSS stores α symbols of c_f, indexed by the elements
of the subset N_i. The encoding scheme for a DRESS code is
shown in Fig. 1.

Fig. 1: The encoding scheme for a DRESS code

Each symbol of c_f is stored in exactly ρ nodes. It should be
possible to reconstruct the stored file f of size M from any set
of k nodes, and hence

$$ M \le \min_{|I| = k} \Bigl| \bigcup_{i \in I} N_i \Bigr|. \tag{1} $$
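
For small codes, the right-hand side of (1) can be evaluated by exhaustive search. The sketch below is an illustration only: the helper name `file_size` and the toy (4, 3, 2) FR code are assumptions, and the search is exponential in k, so it is suitable only for small examples.

```python
from itertools import combinations

def file_size(nodes, k):
    """Right-hand side of (1): the smallest union over all k-subsets of nodes."""
    return min(len(set().union(*group)) for group in combinations(nodes, k))

# Hypothetical toy (4, 3, 2) FR code (assumption, not taken from the paper).
toy = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]
sizes = [file_size(toy, k) for k in range(1, 4)]
print(sizes)
```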


Since we want to maximize the size of a file that can be
stored by using a DRESS code, in the sequel we will always
assume that

$$ M = \min_{|I| = k} \Bigl| \bigcup_{i \in I} N_i \Bigr|. $$

Note that the same FR code can be used in different DRESS
codes, with different values of k as reconstruction degrees
and different MDS codes. The file size M, which is the
dimension of the chosen MDS code, depends on the value of k,
and hence in the sequel we will use M(k) to denote the size
of the file. An (n, α, ρ) FR code is called universally
good [15] if for any k ≤ α the [(θ, M(k)), k, (n, α, ρ)]
DRESS code satisfies

$$ M(k) \ge k\alpha - \binom{k}{2}, \tag{2} $$

where the right-hand side of (2) is the maximum file
size (called the MBR capacity) that can be stored using an MBR
code [?]. Note also that if an FR code C is universally good,
then |N_i ∩ N_j| ≤ 1 for N_i, N_j ∈ C, i ≠ j ∈ [n] [?]. In the
sequel, we will consider only universally good FR codes.

An upper bound on the maximum file size M(k) of a
[(θ, M(k)), k, (n, α, ρ)] DRESS code (nα = ρθ), called the
FR capacity and denoted in the sequel by A(n, k, α, ρ), was
presented in [?]:

$$ A(n,k,\alpha,\rho) \le \varphi(k), \quad \text{where } \varphi(1) = \alpha, \quad \varphi(k+1) = \varphi(k) + \alpha - \left\lceil \frac{\rho\,\varphi(k) - k\alpha}{n-k} \right\rceil. \tag{3} $$

Note that for any given k, the function A(n, k, α, ρ) is
determined by the parameters of the inner FR code. We call an
FR code k-optimal if a file stored by using this code is the
maximum possible for the given k. We call an FR code optimal
if for any k ≤ α it is k-optimal.

Let C be an (n, α, ρ) FR code. C can be described by an
incidence matrix I(C), which is an n × θ binary matrix,
θ = nα/ρ, whose rows are indexed by the nodes and whose
columns are indexed by the symbols of the corresponding MDS
codeword, such that (I(C))_{i,j} = 1 if and only if node i
contains symbol j. Note that every row of I(C) has α ones and
every column of I(C) has ρ ones.

Let G = (V, E) be an α-regular graph with n = |V| vertices.
We say that an (n, α, ρ = 2) FR code C is based on G if
I(C) = I(G), where I(G) is the |V| × |E| incidence matrix of
G. Such a code will be denoted by C_G.

Let D = (P, ℬ) be a design with |P| = n points such that
each block B ∈ ℬ contains ρ points and each point p ∈ P
is contained in α blocks. We say that an (n, α, ρ) FR code C
is based on D if I(C) = I(D), where I(D) is the |P| × |ℬ|
incidence matrix of D. Such a code will be denoted by C_D.

III. Optimal FR Codes with Repetition Degree ρ = 2

In this section we consider optimal FR codes with repetition
degree 2. First, we present the following useful lemma, which
shows a connection between the problem of finding the maximum
file size of an FR code based on a graph and the edge
isoperimetric problem on graphs [?].

Lemma 1. Let G = (V, E) be an α-regular graph and let
C_G be the FR code based on G. We denote by 𝒢_k the family
of induced subgraphs of G with k vertices. Then the file size
M(k) of C_G is given by

$$ M(k) = k\alpha - \max_{G' = (V', E') \in \mathcal{G}_k} |E'|. $$

Proof. For each induced subgraph G' = (V', E') ∈ 𝒢_k we
define E'_cut to be the set of all the edges of E in the cut
between V' and V \ V', i.e.,

$$ E'_{\mathrm{cut}} = \{ \{v, u\} \in E : v \in V',\ u \in V \setminus V' \}. $$

Clearly, kα = 2|E'| + |E'_cut| for every G' ∈ 𝒢_k. Note that
M(k) = min over G' ∈ 𝒢_k of (|E'| + |E'_cut|), and hence

$$ M(k) = \min_{G' \in \mathcal{G}_k} \{ |E'| + \alpha k - 2|E'| \} = \alpha k - \max_{G' \in \mathcal{G}_k} \{ |E'| \}. $$

□
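
Lemma 1 can be checked numerically on a small graph. The sketch below is an illustration under assumptions: it uses the 3-dimensional cube graph Q_3 (3-regular, 8 vertices), which is not an example from the paper, and the function names are hypothetical. It computes M(k) once as a minimum union size, as in (1), and once as kα minus the densest induced subgraph.

```python
from itertools import combinations

def cube_edges():
    """Edges of the cube graph Q_3: vertices 0..7, adjacent iff they differ in one bit."""
    return [(u, u ^ (1 << b)) for u in range(8) for b in range(3)
            if u < (u ^ (1 << b))]

def file_size_by_union(edges, n, k):
    """M(k) from (1): node v stores the indices of the edges incident to v."""
    nodes = [{j for j, e in enumerate(edges) if v in e} for v in range(n)]
    return min(len(set().union(*grp)) for grp in combinations(nodes, k))

def file_size_by_lemma(edges, n, alpha, k):
    """M(k) from Lemma 1: k*alpha minus the largest edge count of an induced k-subgraph."""
    densest = max(sum(1 for u, v in edges if u in s and v in s)
                  for s in map(set, combinations(range(n), k)))
    return k * alpha - densest

edges = cube_edges()
for k in range(1, 5):
    print(k, file_size_by_union(edges, 8, k), file_size_by_lemma(edges, 8, 3, k))
```

Both columns agree for every k, since each edge with both endpoints in the chosen vertex set is counted twice in kα but stored only once.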

The following lemma follows directly from Lemma 1.

Lemma 2. Let G be an α-regular graph with n vertices, and
let M(k) be the file size of the corresponding code C_G. The
graph G contains a k-clique if and only if
$M(k) = k\alpha - \binom{k}{2}$.

Corollary 3. The file size M(k) of an FR code C_G, where G
is a graph which does not contain a k-clique, is strictly larger
than the MBR capacity.

One of the main advantages of an FR code is that its file size
usually exceeds the MBR capacity. Hence, as a consequence of
Corollary 3, we consider regular graphs which do not contain
a k-clique for a given k. In particular, we consider a family
of regular graphs, called Turán graphs, which do not contain
a clique of a given size and also have the smallest number of
vertices [?]. Let r, n be two integers such that r divides n. An
(n, r)-Turán graph is defined as a regular complete r-partite
graph, i.e., a graph formed by partitioning the set of n vertices
into r parts of size n/r and connecting each two vertices of
different parts by an edge. Clearly, an (n, r)-Turán graph does
not contain a clique of size r + 1 and it is an
(r − 1)n/r-regular graph.

Fig. 2: The ((9, M(k)), k, (6, 3, 2)) DRESS code with the inner
FR code based on the complete bipartite graph K_{3,3}

The following theorem shows that FR codes obtained from
Turán graphs attain the upper bound in (3) for all k ≤ α and
hence they are optimal FR codes. The proof of this theorem
follows from Lemma 1 and from Turán's theorem [?, p. 58].


Theorem 4. Let T = (V, E) be an (n, r)-Turán graph, r < n,
α = (r − 1)n/r, and let k be an integer such that 1 ≤ k ≤ α.
Then the (n, α, 2) FR code C_T based on T has file size given
by

M(k) = kα − […],   (4)

which attains the upper bound in (3).
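
The claim that a Turán-graph code meets the recursive bound (3) can be checked numerically in the smallest bipartite case. The sketch below is an illustration under assumptions: the function names are hypothetical, the graph is K_{3,3} (a (6, 2)-Turán graph), and M(k) is found by exhaustive search rather than by a closed formula.

```python
from itertools import combinations

def fr_capacity_bound(n, alpha, rho, k_max):
    """Values phi(1), ..., phi(k_max) of the recursive bound (3)."""
    phi = [alpha]
    for k in range(1, k_max):
        num, den = rho * phi[-1] - k * alpha, n - k
        # -(-num // den) is the integer ceiling of num / den for den > 0.
        phi.append(phi[-1] + alpha - (-(-num // den)))
    return phi

def file_size(nodes, k):
    """Brute-force M(k): the smallest union over all k-subsets of nodes."""
    return min(len(set().union(*grp)) for grp in combinations(nodes, k))

# K_{3,3}: parts {0, 1, 2} and {3, 4, 5}; symbols are the 9 edge indices.
edges = [(u, v) for u in range(3) for v in range(3, 6)]
nodes = [{j for j, e in enumerate(edges) if v in e} for v in range(6)]

bound = fr_capacity_bound(n=6, alpha=3, rho=2, k_max=3)
sizes = [file_size(nodes, k) for k in range(1, 4)]
print(bound, sizes)
```

For this graph both lists coincide, so the code based on K_{3,3} is k-optimal for every k ≤ 3.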

Note that an (n − 1)-regular complete graph K_n is an
(n, n)-Turán graph. Hence, the construction of MBR codes
from [?] can be considered as a special case of our
construction of the DRESS codes with an inner FR code based
on a Turán graph. Note also that an α-regular complete bipartite
graph K_{α,α} is a (2α, 2)-Turán graph. The following example
illustrates Theorem 4 for such a graph.

Example 1. The (6, 3, 2) FR code based on K_{3,3} and its file
size for 1 ≤ k ≤ 3 are shown in Fig. 2.


The following lemma is easily verified from Lemma 1.

Lemma 5. Let C be an (n, α, 2) FR code. Then the file size
M(k) of C for any 1 ≤ k ≤ α satisfies

$$ M(k) \le k\alpha - k + 1. $$


By Lemma 1, to obtain a large value for M(k), every
induced subgraph with k vertices should be as sparse as
possible. Hence, for the rest of this section we consider graphs
where the induced subgraphs with k vertices, 1 ≤ k ≤ α, are
cycle-free. These are graphs with girth at least k + 1, where
the girth of a graph is the length of its shortest cycle.


Lemma 6. Let G be an α-regular graph with n vertices and
let M(k) be the file size of the corresponding FR code C_G. The
girth of G is at least k + 1 if and only if M(k) = kα − (k − 1).

Corollary 7. For each k ≤ g − 1, an FR code C_G based on an
α-regular graph G with girth g attains the bound in (3), and
hence it is k-optimal. C_G also attains the bound of Lemma 5.

Corollary 8. An FR code C_G based on an α-regular graph G
with girth g ≥ α + 1 is optimal.

The proof of the following theorem follows from Lemma 1
and the fact that any two cycles in a graph with girth g have
at most ⌊g/2⌋ + 1 common vertices.

Theorem 9. If G is a graph with girth g, then the file size
M(k) of an FR code C_G based on G satisfies

$$ M(k) = \begin{cases} k\alpha - k + 1 & \text{if } k \le g - 1, \\ k\alpha - k & \text{if } g \le k \le g + \lceil g/2 \rceil - 2. \end{cases} $$
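
The two regimes of Theorem 9 can be observed on a concrete graph. The sketch below is an illustration under assumptions: it uses the Petersen graph (3-regular, girth 5), which is chosen here and is not an example from the paper, and the function names are hypothetical. The brute-force values are compared with the piecewise formula for 1 ≤ k ≤ g + ⌈g/2⌉ − 2 = 6.

```python
from itertools import combinations

def petersen_edges():
    """Petersen graph: outer 5-cycle, five spokes, inner pentagram."""
    outer = [(i, (i + 1) % 5) for i in range(5)]
    spokes = [(i, i + 5) for i in range(5)]
    inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
    return outer + spokes + inner

def file_size(edges, n, k):
    """Brute-force M(k) for the FR code whose node v stores its incident edges."""
    nodes = [{j for j, e in enumerate(edges) if v in e} for v in range(n)]
    return min(len(set().union(*grp)) for grp in combinations(nodes, k))

def predicted(alpha, g, k):
    """Piecewise value of M(k) as stated in Theorem 9."""
    if k <= g - 1:
        return k * alpha - k + 1
    if k <= g + (g + 1) // 2 - 2:   # (g + 1) // 2 equals ceil(g / 2)
        return k * alpha - k
    raise ValueError("k is outside the range covered by Theorem 9")

edges = petersen_edges()
observed = [file_size(edges, 10, k) for k in range(1, 7)]
expected = [predicted(3, 5, k) for k in range(1, 7)]
print(observed, expected)
```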


A (d, g)-cage is a d-regular graph with girth g and minimum
number of vertices. Let N(d, g) be the minimum number of
vertices in a (d, g)-cage. A lower bound on N(d, g), known as
the Moore bound [?, p. 180], is given by

$$ n_0(d,g) = \begin{cases} 1 + d \sum_{i=0}^{(g-3)/2} (d-1)^i & \text{if } g \text{ is odd}, \\ 2 \sum_{i=0}^{(g-2)/2} (d-1)^i & \text{if } g \text{ is even}. \end{cases} $$

Lemma 10. The bound in (3) is not tight for ρ = 2 if

$$ \alpha k - \alpha - k + 3 \le n < N(\alpha, k+1). $$

As a consequence of Lemma 10, we have that the bound
in (3) is not always tight, and hence we have a similar, better
bound on A(n, k, α, ρ):

$$ A(n,k,\alpha,\rho) \le \varphi'(k), \quad \text{where } \varphi'(1) = \alpha, \quad \varphi'(k+1) = A(n,k,\alpha,\rho) + \alpha - \left\lceil \frac{\rho A(n,k,\alpha,\rho) - k\alpha}{n-k} \right\rceil. $$
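
The Moore bound is easy to evaluate, and it gives a sufficient test for the condition of Lemma 10, because n < n_0(α, k + 1) implies n < N(α, k + 1). The following sketch is an illustration only: the function names are hypothetical, and the second function checks only this sufficient condition, not the exact cage order N(α, k + 1).

```python
def moore_bound(d, g):
    """Moore lower bound n_0(d, g) on the order of a d-regular graph with girth g."""
    if g % 2 == 1:
        return 1 + d * sum((d - 1) ** i for i in range((g - 3) // 2 + 1))
    return 2 * sum((d - 1) ** i for i in range((g - 2) // 2 + 1))

def bound3_not_tight_sufficient(n, alpha, k):
    """Sufficient check for Lemma 10 (rho = 2), with n_0 used in place of N."""
    return alpha * k - alpha - k + 3 <= n < moore_bound(alpha, k + 1)

print(moore_bound(3, 5), moore_bound(3, 6))
print(bound3_not_tight_sufficient(n=8, alpha=3, k=4))
```

For d = 3 the values n_0(3, 5) = 10 and n_0(3, 6) = 14 equal the orders of the Petersen and Heawood graphs, respectively.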


IV. Optimal FR Codes with Repetition Degree ρ > 2

In this section, we consider FR codes with repetition degree
ρ > 2. Note that while codes with ρ = 2 have the maximum
data/storage ratio, codes with ρ > 2 provide multiple choices
for node repairs. In other words, when a node fails, it can be
repaired from different d-subsets of available nodes.

We present generalizations of the constructions from the
previous section, which were based on Turán graphs and graphs
with a given girth. These generalizations employ transversal
designs and generalized polygons, respectively.

A transversal design of group size h and block size ℓ,
denoted by TD(ℓ, h), is a triple (P, 𝒢, ℬ), where

1) P is a set of ℓh points;

2) 𝒢 is a partition of P into ℓ sets (groups), each one of
size h;

3) ℬ is a collection of ℓ-subsets of P (blocks);

4) each block meets each group in exactly one point;

5) any pair of points from different groups is contained in
exactly one block.

The properties of a transversal design TD(ℓ, h) which will be
useful for our constructions are summarized in the following
lemma [?].

Lemma 11. Let (P, 𝒢, ℬ) be a transversal design TD(ℓ, h).
The number of points is given by |P| = ℓh, the number of
groups is given by |𝒢| = ℓ, the number of blocks is given by
|ℬ| = h², and the number of blocks that contain a given point
is equal to h.
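
The definition and the counts of Lemma 11 can be verified on a small instance. The sketch below is an illustration under assumptions: it uses a standard linear construction over the integers modulo a prime h with ℓ ≤ h, which is not necessarily the construction intended by the authors, and all function names are hypothetical.

```python
from itertools import combinations

def transversal_design(l, h):
    """TD(l, h) for prime h and l <= h (assumed): block (a, b) is {(g, (a*g + b) % h)}."""
    points = [(g, x) for g in range(l) for x in range(h)]
    blocks = [frozenset((g, (a * g + b) % h) for g in range(l))
              for a in range(h) for b in range(h)]
    return points, blocks

def is_transversal_design(points, blocks, l, h):
    """Check properties 1)-5) of the definition, with groups given by the first coordinate."""
    if len(points) != l * h or any(len(blk) != l for blk in blocks):
        return False
    # Each block meets each group in exactly one point.
    if any({g for g, _ in blk} != set(range(l)) for blk in blocks):
        return False
    # Any two points from different groups lie in exactly one common block.
    return all(sum(1 for blk in blocks if p in blk and q in blk) == 1
               for p, q in combinations(points, 2) if p[0] != q[0])

points, blocks = transversal_design(3, 5)
print(is_transversal_design(points, blocks, 3, 5), len(set(blocks)))
```

For TD(3, 5) this yields 15 points, 25 = h² distinct blocks, and every point lies in h = 5 blocks, in agreement with Lemma 11.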

Let TD be a transversal design TD(ρ, α), ρ ≤ α + 1, with
block size ρ and group size α. Let C_TD be an (n, α, ρ) FR
code based on TD (see Section II). By Lemma 11, there are
ρα points in TD and hence n = ρα. Note that all the symbols
stored in node i correspond to the set N_i of blocks from TD
that contain the point i. Since by Lemma 11 there are α blocks
that contain a given point, it follows that each node stores α
symbols.

Theorem 12. Let k = bρ + t, for integers b, t ≥ 0 such that
t ≤ ρ − 1. For an (n = ρα, α, ρ) FR code C_TD based on a
transversal design TD(ρ, α) we have

$$ M(k) \ge k\alpha - \binom{k}{2} + \rho \binom{b}{2} + bt. $$

Remark 1. Note that for all k ≥ ρ + 1, the file size of the FR
code C_TD is strictly larger than the MBR capacity.


Note that the incidence matrix of the transversal design
TD(2, α) is equal to the incidence matrix of the (2α, 2)-Turán
graph, and hence in this case C_TD = C_T.


Example 2. Let TD be a transversal design TD(3, 4) defined as
follows: P = {1, 2, ..., 12}; 𝒢 = {G_1, G_2, G_3}, where
G_1 = {1, 2, 3, 4}, G_2 = {5, 6, 7, 8}, and
G_3 = {9, 10, 11, 12}; ℬ = {B_1, B_2, ..., B_16}, where
B_1 = {1, 5, 9}, B_2 = {1, 6, 10}, B_3 = {1, 7, 11},
B_4 = {1, 8, 12}, B_5 = {2, 5, 10}, B_6 = {2, 6, 9},
B_7 = {2, 7, 12}, B_8 = {2, 8, 11}, B_9 = {3, 5, 12},
B_10 = {3, 6, 11}, B_11 = {3, 7, 10}, B_12 = {3, 8, 9},
B_13 = {4, 5, 11}, B_14 = {4, 6, 12}, B_15 = {4, 7, 9}, and
B_16 = {4, 8, 10}.

The placement of symbols from a codeword of the
corresponding MDS code of length 16 is shown in Fig. 3. The
values of the file size M(k) for 1 ≤ k ≤ 4 are given in the
following table.

   k     M(k)
   1      4
   2      7
   3      9
   4     11
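
The table can be reproduced by exhaustive search over the blocks of Example 2. The sketch below is an illustration under assumptions: the function names are hypothetical, and the lower bound is coded in the form kα − C(k, 2) + ρ·C(b, 2) + bt with k = bρ + t, which is the form consistent with the table values.

```python
from itertools import combinations
from math import comb

# Blocks B_1, ..., B_16 of the TD(3, 4) from Example 2.
blocks = [{1, 5, 9}, {1, 6, 10}, {1, 7, 11}, {1, 8, 12},
          {2, 5, 10}, {2, 6, 9}, {2, 7, 12}, {2, 8, 11},
          {3, 5, 12}, {3, 6, 11}, {3, 7, 10}, {3, 8, 9},
          {4, 5, 11}, {4, 6, 12}, {4, 7, 9}, {4, 8, 10}]

# Node p stores the indices of the blocks that contain point p.
nodes = [{j for j, blk in enumerate(blocks, start=1) if p in blk}
         for p in range(1, 13)]

def file_size(nodes, k):
    """Brute-force M(k): the smallest union over all k-subsets of nodes."""
    return min(len(set().union(*grp)) for grp in combinations(nodes, k))

def td_lower_bound(alpha, rho, k):
    """Lower bound on M(k) for a code based on TD(rho, alpha), with k = b*rho + t."""
    b, t = divmod(k, rho)
    return k * alpha - comb(k, 2) + rho * comb(b, 2) + b * t

sizes = [file_size(nodes, k) for k in range(1, 5)]
bounds = [td_lower_bound(4, 3, k) for k in range(1, 5)]
print(sizes, bounds)
```

For this design the bound is met with equality for every k from 1 to 4.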


Remark 2. The conditions on the parameters of TD such that
the bound on the file size of an FR code C_TD from Theorem 12
attains the recursive bound in (3) can be found in [?].

Fig. 3: The (12, 4, 3) FR code based on TD(3, 4)


Similarly to an FR code C_G with ρ = 2 based on a graph G
with girth g, one can consider an FR code C_GP based on a
generalized g-gon (generalized polygon GP [?]) for ρ > 2.
One can prove that the file size of C_GP is identical to the file
size of C_G for k ≤ g + ⌈g/2⌉ − 2 given in Theorem 9. However,
a generalized g-gon is known to exist only for g ∈ {3, 4, 6, 8}.
This observation also holds for a general biregular bipartite
graph of girth 2g, not only the incidence graph of a generalized
polygon.

Remark 3. Note that the problem of constructing FR codes
with ρ > 2 can also be considered in terms of bipartite
expander graphs (see, e.g., [?]). Let G_ex = (L ∪ R, E) be
a bipartite expander and let C_ex be the FR code such that
the subset N_i, 1 ≤ i ≤ n, corresponds to the ith vertex in
L and the symbol j, 1 ≤ j ≤ θ, corresponds to the jth
vertex in R, |L| = n and |R| = θ. Then calculating M(k)
can be described as calculating the number of neighbours of
any subset of L of size k. In other words, for an FR code with
file size M(k) it should hold that |Γ(A)| ≥ M(k) for every
A ⊆ L of size k, where Γ(A) denotes the set of neighbours
of A. Hence, to have an FR code with file size M(k), one
needs to construct a (k, M(k)/k) expander graph, where M(k)/k
is its expansion factor [?].

V. Fractional Repetition Batch Codes 

In this section we propose a new type of codes for DSS, 
called fractional repetition batch (FRB) codes, which enable 
uncoded efficient exact node repairs and load balancing which 
can be performed by several users in parallel. An FRB code 
is a combination of an FR code and a uniform combinatorial
batch code. 

The family of codes called batch codes was proposed in [?]
for load balancing in distributed storage. A batch code stores
θ (encoded) data symbols in n system nodes in such a way
that any batch of t data symbols can be decoded by reading at
most one symbol from each node. In a ρ-uniform combinatorial
batch code, proposed in [?], each node stores a subset of data
symbols and no decoding is required during retrieval of any
batch of t symbols. Each symbol is stored in exactly ρ nodes,
and hence it is also called a replication based batch code. A
ρ-uniform combinatorial batch code is denoted by
ρ − (θ, N, t, n)-CBC, where N = ρθ is the total storage over
all the n nodes. These codes were studied in [?].

Next, we provide a formal definition of FRB codes. This
definition is based on the definitions of a DRESS code and a
uniform combinatorial batch code. Let f ∈ F_q^M be a file of size
M and let c_f ∈ F_q^θ be a codeword of a (θ, M) MDS code
which encodes the data f. Let {N_1, ..., N_n} be a collection
of α-subsets of the set [θ]. A ρ − (n, M, k, α, t) FRB code
C, k ≤ α, t ≤ M, represents a system of n nodes with the
following properties:

1) Every node i, 1 ≤ i ≤ n, stores α symbols of c_f indexed
by N_i;

2) Every symbol of c_f is stored in ρ nodes;

3) From any set of k nodes it is possible to reconstruct the
stored file f; in other words,
$M = \min_{|I| = k} |\bigcup_{i \in I} N_i|$;

4) Any batch of t symbols from c_f can be retrieved by
downloading at most one symbol from each node.

Note that the retrieval of any batch of t symbols can be 
performed by t different users in parallel, where each user 
gets a different symbol. 
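
Property 4) asks that every batch of t symbols admit a system of distinct representatives among the nodes, which can be tested with bipartite matching. The sketch below is an illustration under assumptions: the function names are hypothetical, the matching routine is a plain augmenting-path search chosen for brevity, and the instance is the FR code based on K_{3,3}.

```python
from itertools import combinations

def can_retrieve(batch, holders):
    """True if the symbols in `batch` can be read from pairwise distinct nodes."""
    match = {}  # node -> symbol currently assigned to that node

    def assign(sym, seen):
        # Try each node holding `sym`; displace a previous assignment if possible.
        for node in holders[sym]:
            if node in seen:
                continue
            seen.add(node)
            if node not in match or assign(match[node], seen):
                match[node] = sym
                return True
        return False

    return all(assign(sym, set()) for sym in batch)

def supports_batches(nodes, t):
    """True if every t-subset of symbols can be retrieved one symbol per node."""
    symbols = sorted(set().union(*nodes))
    holders = {s: [i for i, nd in enumerate(nodes) if s in nd] for s in symbols}
    return all(can_retrieve(batch, holders) for batch in combinations(symbols, t))

# FR code based on K_{3,3}: node v stores the indices of its incident edges.
edges = [(u, v) for u in range(3) for v in range(3, 6)]
nodes = [{j for j, e in enumerate(edges) if v in e} for v in range(6)]
print(supports_batches(nodes, 5), supports_batches(nodes, 6))
```

For K_{3,3} the check succeeds for t = 5 and fails for t = 6: the six edges of a K_{2,3} subgraph touch only five nodes, so they cannot all be read from distinct nodes.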

In the following, we present our constructions of FRB codes,
which are based on the uniform batch codes from [?] and [?]
and on FR codes considered in Sections III and IV.


Theorem 13.

1) If K_{α,α} is a complete bipartite graph with α > 2, then
C_{K_{α,α}} is a 2 − (2α, M, k, α, 5) FRB code with
M = kα − ⌊k²/4⌋.

2) If G is an α-regular graph on n vertices with girth g,
then C_G is a 2 − (n, M, k, α, 2g − ⌊g/2⌋ − 1) FRB code
with

$$ M = \begin{cases} k\alpha - k + 1 & \text{if } k \le g - 1, \\ k\alpha - k & \text{if } g \le k \le g + \lceil g/2 \rceil - 2. \end{cases} $$

3) Let TD be a resolvable transversal design TD(α − 1, α),
for a prime power α. C_TD is an
(α − 1) − (α² − α, M, k, α, α² − α − 1) FRB code with

$$ M \ge k\alpha - \binom{k}{2} + (\alpha - 1)\binom{x}{2} + xy, $$

where x, y are nonnegative integers
which satisfy k = x(α − 1) + y, y ≤ α − 2.


Example 3.

• Consider the code C_{K_{3,3}} based on K_{3,3} (see also
Example 1 for an FR code based on K_{3,3}). By Theorem 13, for
k = 3, C_{K_{3,3}} is a 2 − (6, 7, 3, 3, 5) FRB code.

• Consider the code C_TD based on the resolvable transversal
design TD = TD(3, 4) (see also Example 2 for an
FR code based on TD(3, 4)). By Theorem 13, for k = 4,
C_TD is a 3 − (12, 11, 4, 4, 11) FRB code, which stores
a file of size 11 and allows for retrieval of any (coded)
11 symbols by reading at most one symbol from a node.
In particular, when using a systematic MDS code, C_TD
provides load balancing in data reconstruction.


Remark 4. Similarly to FR codes, the problem of constructing
FRB codes can be considered in terms of bipartite
expanders (see Remark 3). The construction of batch codes
based on (unbalanced) expander graphs was proposed in [?].
To construct an FRB code, one needs a bipartite expander with
two different expansion factors, M(k)/k and 1, for the two sides
L and R of the graph, respectively.

VI. Conclusion 

We considered the problem of constructing optimal FR codes 
and as a consequence, optimal DRESS codes. We presented 
constructions of FR codes based on Turán graphs, graphs with
a given girth, transversal designs, and generalized polygons. 
Based on a connection between FR codes and batch codes, 
we proposed a new family of codes for DSS, FRB codes, 
which have the properties of batch codes and FR codes 
simultaneously. These are the first codes for DSS which allow 
uncoded efficient exact repairs and load balancing. 